1 GCCAGCTGGG GTTACTTTAA AAAACATGCT CCATGTGCAT CCCTCTTGAA 
51 GCTTCGCACT CTGTTGAAGA GGACACTCAT CCCAGTCATT ATTTAGAAGC 
101 AAGGTCCTTG AATGAGCGAG ATTATCGGGA CCGGAGATAC GTTGACGAAT 
151 ACAGGAATGA CTACTGTGAA GGATATGTTC CTAGACATTA TCACAGAGAC 
201 ATTGAAAGCG GGTATCGAAT CCACTGCAGT AAATCTTCAG TCCGCAGCAG 
251 GAGAAGCAGT CCTAAAAGGA AGCGCAATAG ACACTGTTCA AGTCATCAGT 
301 CACGTTCGAA GAGCCACCGA AGGAAAAGAT CCAGGAGTAT AGAGGATGAT 
351 GAGGAGGGTC ACCTGATCTG TCAAAGTGGA GACGTTCTAA GAGCAAGATA 
401 TGAAATCGTG GACACTTTGG GTGAAGGAGC CTTTGGCAAA GTTGTAGAGT 
451 GCATTGATCA TGGCATGGAT GGCATGCATG TAGCAGTGAA AATCGTAAAA 
501 AATGTAGGCC GTTACCGTGA AGCAGCTCGT TCAGAAATCC AAGTATTAGA 
551 GCACTTAAAT AGTACTGATC CCAATAGTGT CTTCCGATGT GTCCAGATGC 
601 TAGAATGGTT TGATCATCAT GGTCATGTTT GTATTGTGTT TGAACTACTG 
651 GGACTTAGTA CTTACGATTT CATTAAAGAA AACAGCTTTC TGCCATTTCA 
701 AATTGACCAC ATCAGGCAGA TGGCGTATCA GATCTGCCAG TCAATAAATT 
751 TTTTACATCA TAATAAATTA ACCCATACAG ATCTGAAGCC TGAAAATATT 
801 TTGTTTGTGA AGTCTGACTA TGTAGTCAAA TATAATTCTA AAATGAAACG 
851 TGATGAACGC ACACTGAAAA ACACAGATAT CAAAGTTGTT GACTTTGGAA 
901 GTGCAACGTA TGATGATGAA CATCACAGTA CTTTGGTGTC TACCCGGCAC 
951 TACAGAGCTC CCGAGGTCAT TTTGGCTTTA GGTTGGTCTC AGCCTTGTGA 
1001 TGTTTGGAGC ATAGGTTGCA TTCTTATTGA ATATTACCTT GGTTTCACAG 
1051 TCTTTCAGAC TCATGATAGT AAAGAGCACC TGGCAATGAT GGAACGAATA 
1101 TTAGGACCCA TACCACAACA CATGATTCAG AAAACAAGAA AACGCAAGTA 
1151 TTTTCACCAT AACCAGCTAG ATTGGGATGA ACACAGTTCT GCTGGTAGAT 
1201 ATGTTAGGAG ACGCTGCAAA CCGTTGAAGG AATTTATGCT TTGTCATGAT 
1251 GAAGAACATG AGAAACTGTT TGACCTGGTT CGAAGAATGT TAGAATATGA 
1301 TCCAACTCAA AGAATTACCT TGGATGAAGC ATTGCAGCAT CCTTTCTTTG 
1351 ACTTATTAAA AAAGAAATGA AATGGGAATC AGTGGTCTTA CTATATACTT 
1401 CTCTAGAAGA GATTACTTAA GACTGTGTCA GTCAACTAAA CATTCTAATA 
1451 TTTTTGTAAA CATTAAATTA TTTTGTACAG TTAAGTGTAA ATATTGTATG
1501 TTTTGTATCA ATAGCATAAT TAACTTGTTA AGCAAGTATG GTCTTGATAA 
1551 TGCATTAGAA AAATTAAAAT TAATTTTTCT TTTTGAAATT ACCATTTTTA
1601 AATACCTTTG AAATATCCTT TGTGTCCAGT GATAAATGTG ATTGATCTTG 
1651 CCTTTTGTAC ATGGAGGTCA CCTCTGAAGT GATTTTTTTT GAGTAAAAGG
1701 AAATCTTGAC TACTTTATAT TCTTAAAGGA ATATTCTTTA TATACTTCAA 
1751 ATTTAGAACT TAACTTTAAA AGTTTTTCTT CTGTAATTGT TGAACGGGTG
1801 ATTATTATTA ACTCTAGATA AGCAGGTACT AGAAACCAAA ACTCAGAAAA 
1851 TGTTTACTGT TAGAATTCTA TTAAATTTTA AGTGTTGTAT TCTTTTTCAT
1901 TGGGTGATGT CAGGGTGATA ACCAGACATT CATGGAAAGG CATGCAGTTT 
1951 GTCCATTGTG ACAGTTTGTT TAATAAAACC ACATACACAC TTTATTTAAG 
2001 ATTAAAATCT AACTGGAAAG TCAGCTTGGA AAATGGACAT TTCCAAGTAT 
2051 GTTTGGTGAG TCACAGATAT AAAAATAGAA ATTCTGATGA GAGGTTTCAG 
2101 TTTTTAATAC CAAGTCCTTA GGAGTCTTAA CATTGGCCAG CATCTGTTTA
2151 TCAAATGACA TAAATACGTA AACCTATAAG AATTAAGTTT ATTAATTAGG 
2201 CAATTTATGT CTGTGATAAT TCTTACGGGA GAAAGAGGAT TTGATTGGAA 
2251 AGCAGTTTGG GAAGAAAGTG CTGCTGAAAT TTCCAGAATT TAATTGATTG 
2301 GTTACATAAA CTTTTTGACT TCAGAAAAAA AAAATAAAAA AACAAAAAAA
2351 AAAC 
(SEQ ID NO: 1) 

FEATURES:

5'UTR: 1-32 

Start Codon: 33 

Stop Codon: 1368 

3'UTR: 1371 
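
The feature coordinates above can be checked against the sequence with a short script. The following is a minimal sketch, not part of the original listing: it assumes the standard genetic code, uses only the first 100 nucleotides of SEQ ID NO: 1, and the names `CODON_TABLE`, `translate_from` and `FIRST_100` are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table applies to this transcript).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from(seq, start):
    """Translate seq from 1-based position `start` until a stop codon or the end."""
    seq = seq.upper().replace(" ", "")
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


# Nucleotides 1-100 of SEQ ID NO: 1, copied from the listing above.
FIRST_100 = ("GCCAGCTGGG GTTACTTTAA AAAACATGCT CCATGTGCAT CCCTCTTGAA "
             "GCTTCGCACT CTGTTGAAGA GGACACTCAT CCCAGTCATT ATTTAGAAGC")

# Position 33 is the start codon given in FEATURES.
print(translate_from(FIRST_100, 33))  # MCIPLEASHSVEEDTHPSHYLE
```

The printed residues agree with the first 22 residues of SEQ ID NO: 2, which is consistent with a start codon at position 33.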
Homologous proteins:
Top 10 blast hits:
                                                                        Score     E
Sequences producing significant alignments:                             (bits)  value

CRA|150000079514205 /altid=gi|10190706 /def=ref|NP_065717.1| pr...        904   0.0
CRA|18000005115066 /altid=gi|6671766 /def=ref|NP_031740.1| CDC...         883   0.0
CRA|335001098680506 /altid=gi|11416272 /def=ref|XP_003664.1| si...        745   0.0
CRA|335001098687191 /altid=gi|11429914 /def=ref|XP_002520.1| CD...        740   0.0
CRA|18000004973971 /altid=gi|4758008 /def=ref|NP_004062.1| CDC-...        738   0.0
CRA|18000004935844 /altid=gi|110864 /def=pir|A39676 protein ki...         718   0.0
CRA|18000004938713 /altid=gi|125290 /def=sp|P22518|CLK1_MOUSE P...        716   0.0
CRA|114000015334919 /altid=gi|9437515 /def=gb|AAF87326.1|AF2122...        700   0.0
CRA|18000004896888 /altid=gi|107458 /def=pir|A38643 protein ki...         670   0.0
CRA|98000043608390 /altid=gi|12805489 /def=gb|AAH02220.1|AAH022...        630   e-179

EST:
                                                                        Score     E
Sequences producing significant alignments:                             (bits)  value

gi|12603368 /dataset=dbest /taxon=96...                                   785   0.0
gi|2555404 /dataset=dbest /taxon=9606 ...                                 712   0.0
gi|10341364 /dataset=dbest /taxon=960...                                  549   e-154
gi|3733981 /dataset=dbest /taxon=9606 ...                                 450   e-124
gi|900131 /dataset=dbest /taxon=9606 /...                                 432   e-118
gi|6034370 /dataset=dbest /taxon=9606 ...                                 424   e-116
gi|2824947 /dataset=dbest /taxon=9606 ...                                 396   e-108
gi|7318123 /dataset=dbest /taxon=9606...                                  381   e-103
gi|10913732 /dataset=dbest /taxon=96...                                   335   2e-89

EXPRESSION INFORMATION FOR MODULATORY USE:

library source:
gi|12603368 Bone osteosarcoma cell line
gi|2555404 Breast
gi|10341364 Uterus leiomyosarcoma
gi|3733981 Fetal heart
gi|900131 infant brain
gi|6034370 Colon-juvenile granulosa tumor
gi|2824947 Mixed
gi|7318123 Colon-moderately differentiated adenocarcinoma
gi|10913732 Bone marrow hematopoietic stem cells
gi|2824947 Pooled human melanocyte, fetal heart, and pregnant uterus
gi|10088906 nervous_normal
gi|9093801 leukopheresis myeloid cell

Tissue expression:
Leukocyte
1 MCIPLEASHS VEEDTHPSHY LEARSLNERD YRDRRYVDEY RNDYCEGYVP
51 RHYHRDIESG YRIHCSKSSV RSRRSSPKRK RNRHCSSHQS RSKSHRRKRS
101 RSIEDDEEGH LICQSGDVLR ARYEIVDTLG EGAFGKVVEC IDHGMDGMHV
151 AVKIVKNVGR YREAARSEIQ VLEHLNSTDP NSVFRCVQML EWFDHHGHVC
201 IVFELLGLST YDFIKENSFL PFQIDHIRQM AYQICQSINF LHHNKLTHTD
251 LKPENILFVK SDYVVKYNSK MKRDERTLKN TDIKVVDFGS ATYDDEHHST
301 LVSTRHYRAP EVILALGWSQ PCDVWSIGCI LIEYYLGFTV FQTHDSKEHL 
351 AMMERILGPI PQHMIQKTRK RKYFHHNQLD WDEHSSAGRY VRRRCKPLKE 
401 FMLCHDEEHE KLFDLVRRML EYDPTQRITL DEALQHPFFD LLKKK 
(SEQ ID NO: 2) 

FEATURES: 

Functional domains and key regions: 

[1] PDOC00001 PS00001 ASN_GLYCOSYLATION 
N-glycosylation site 

176-179 NSTD 



[2] PDOC00004 PS00004 CAMP_PHOSPHO_SITE 

cAMP- and cGMP-dependent protein kinase phosphorylation site

Number of matches: 2 

1 73-76 RRSS 

2 97-100 RKRS 



[3] PDOC00005 PS00005 PKC_PHOSPHO_SITE 
Protein kinase C phosphorylation site 

Number of matches: 8 



1 69-71 SVR

2 72-74 SRR

3 76-78 SPK

4 94-96 SHR

5 277-279 TLK

6 303-305 STR

7 368-370 TRK

8 425-427 TQR



[4] PDOC00006 PS00006 CK2_PHOSPHO_SITE 
Casein kinase II phosphorylation site 

Number of matches: 8 

1 10-13 SVEE 

2 25-28 SLNE 

3 102-105 SIED 

4 128-131 TLGE 

5 209-212 STYD 

6 247-250 THTD 

7 292-295 TYDD 

8 429-432 TLDE 



[5] PDOC00007 PS00007 TYR_PHOSPHO_SITE 
Tyrosine kinase phosphorylation site 

Number of matches: 3 

1 24-31 RSLNERDY 

2 29-36 RDYRDRRY 

3 55-61 RDIESGY 



[6] PDOC00008 PS00008 MYRISTYL 

N-myristoylation site 

147-152 GMHVAV 
[7] PDOC00100 PS00107 PROTEIN_KINASE_ATP
Protein kinases ATP-binding region signature

129-153 LGEGAFGKVVECIDHGMDGMHVAVK



[8] PDOC00100 PS00108 PROTEIN_KINASE_ST

Serine/Threonine protein kinases active-site signature 

246-258 LTHTDLKPENILF 
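
The site matches listed above come from short sequence patterns, and two of them can be reproduced with regular expressions. The sketch below is illustrative only: the pattern forms `[ST]-x-[RK]` and `[RK](2)-x-[ST]` are assumed here as the PROSITE definitions of PS00005 and PS00004, the fragment is residues 61-100 of SEQ ID NO: 2 with position 90 read as S (as in the BLAST alignment), and `scan`, `FRAGMENT`, `OFFSET` and `PATTERNS` are hypothetical names.

```python
import re

# Residues 61-100 of SEQ ID NO: 2 (position 90 taken as S, as in the alignment).
FRAGMENT = "YRIHCSKSSVRSRRSSPKRKRNRHCSSHQSRSKSHRRKRS"
OFFSET = 61  # 1-based position of the first residue in FRAGMENT

# Assumed regular-expression forms of the PROSITE patterns.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",      # [ST]-x-[RK]
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",  # [RK](2)-x-[ST]
}


def scan(seq, pattern, offset=1):
    """Return (begin, end, match) for every occurrence, overlapping ones included."""
    hits = []
    # A zero-width lookahead lets the search restart at every residue.
    for m in re.finditer(f"(?=({pattern}))", seq):
        begin = m.start() + offset
        hits.append((begin, begin + len(m.group(1)) - 1, m.group(1)))
    return hits


for name, pattern in PATTERNS.items():
    print(name, scan(FRAGMENT, pattern, OFFSET))
```

On this fragment the scan returns 69-71 SVR, 72-74 SRR, 76-78 SPK and 94-96 SHR for the PKC pattern, and 73-76 RRSS and 97-100 RKRS for the cAMP/cGMP pattern, matching the corresponding entries in the lists above.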

Membrane spanning structure and domains: 
Candidate membrane-spanning segments:
Helix Begin End Score Certainty
1 324 344 1.141 Certain 



BLAST Alignment to Top Hit: 

>CRA | 150000079514205 /altid=gi | 10190706 /def=ref |NP_065717.1| protein 
serine threonine kinase Clk4 [Homo sapiens] /org=Homo 
sapiens /taxon=9606 /dataset=nraa /length=481 
Length = 481 

Score = 904 bits (2312), Expect = 0.0 

Identities = 427/427 (100%), Positives = 427/427 (100%) 

Frame = +3 

Query: 87 hylearslnerdyrdrryvdeyrndycegyvprhyhrdiesgyrihcskssvrsrrsspk 266 

HYLEARSLNERDYRDRRYVDEYRNDYCEGYVPRHYHRDIESGYRIHCSKSSVRSRRSSPK 
Sbjct: 55 HYLEARSLNERDYRDRRYVDEYRNDYCEGYVPRHYHRDIESGYRIHCSKSSVRSRRSSPK 114 

Query: 267 RKRNRHCSSHQSRSKSHRRKRSRSIEDDEEGHLICQSGDVLRARYEIVDTLGEGAFGKVV 446 

RKRNRHCSSHQSRSKSHRRKRSRSIEDDEEGHLICQSGDVLRARYEIVDTLGEGAFGKVV 
Sbjct: 115 RKRNRHCSSHQSRSKSHRRKRSRSIEDDEEGHLICQSGDVLRARYEIVDTLGEGAFGKVV 174 

Query: 447 ECIDHGMDGMHVAVKIVKNVGRYREAARSEIQVLEHLNSTDPNSVFRCVQMLEWFDHHGH 626 

ECIDHGMDGMHVAVKIVKNVGRYREAARSEIQVLEHLNSTDPNSVFRCVQMLEWFDHHGH 
Sbjct: 175 ECIDHGMDGMHVAVKIVKNVGRYREAARSEIQVLEHLNSTDPNSVFRCVQMLEWFDHHGH 234 

Query: 627 VCIVFELLGLSTYDFIKENSFLPFQIDHIRQMAYQICQSINFLHHNKLTHTDLKPENILF 806 

VCIVFELLGLSTYDFIKENSFLPFQIDHIRQMAYQICQSINFLHHNKLTHTDLKPENILF 
Sbjct: 235 VCIVFELLGLSTYDFIKENSFLPFQIDHIRQMAYQICQSINFLHHNKLTHTDLKPENILF 294 

Query: 807 VKSDYVVKYNSKMKRDERTLKNTDIKVVDFGSATYDDEHHSTLVSTRHYRAPEVILALGW 986

VKSDYVVKYNSKMKRDERTLKNTDIKVVDFGSATYDDEHHSTLVSTRHYRAPEVILALGW
Sbjct: 295 VKSDYVVKYNSKMKRDERTLKNTDIKVVDFGSATYDDEHHSTLVSTRHYRAPEVILALGW 354

Query: 987 SQPCDVWSIGCILIEYYLGFTVFQTHDSKEHLAMMERILGPIPQHMIQKTRKRKYFHHNQ 1166 

SQPCDVWSIGCILIEYYLGFTVFQTHDSKEHLAMMERILGPIPQHMIQKTRKRKYFHHNQ 
Sbjct: 355 SQPCDVWSIGCILIEYYLGFTVFQTHDSKEHLAMMERILGPIPQHMIQKTRKRKYFHHNQ 414 

Query: 1167 LDWDEHSSAGRYVRRRCKPLKEFMLCHDEEHEKLFDLVRRMLEYDPTQRITLDEALQHPF 1346 

LDWDEHSSAGRYVRRRCKPLKEFMLCHDEEHEKLFDLVRRMLEYDPTQRITLDEALQHPF 
Sbjct: 415 LDWDEHSSAGRYVRRRCKPLKEFMLCHDEEHEKLFDLVRRMLEYDPTQRITLDEALQHPF 474 

Query: 1347 FDLLKKK 1367 
FDLLKKK 

Sbjct: 475 FDLLKKK 481 (SEQ ID NO:4) 
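
The Query coordinates in the alignment above are nucleotide positions in SEQ ID NO: 1, while the Sbjct coordinates are residue numbers. A small sketch of the conversion follows; it assumes the start codon at position 33 from the FEATURES block, and the function names are hypothetical.

```python
CDS_START = 33  # start codon position from the FEATURES block of SEQ ID NO: 1


def reading_frame(nt_pos):
    """Forward reading frame (1, 2 or 3) of a 1-based nucleotide position."""
    return (nt_pos - 1) % 3 + 1


def residue_number(nt_pos, cds_start=CDS_START):
    """1-based residue of the encoded protein whose codon starts at nt_pos."""
    offset = nt_pos - cds_start
    if offset < 0 or offset % 3:
        raise ValueError("position is not a codon start in the coding frame")
    return offset // 3 + 1


# Query 87 opens the alignment; the codon at 1365-1367 is the last one aligned.
print(reading_frame(87), residue_number(87), residue_number(1365))  # 3 19 445
```

Position 87 lies in frame +3, the same frame as the start codon, and corresponds to residue 19 (H) of SEQ ID NO: 2; the final aligned codon corresponds to residue 445, the last residue of the protein.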
>CRA | 18000004973971 /altid=gi | 4758008 /def=ref |NP_004062.1| CDC-like
kinase 1; CDC-like kinase 1 [Homo sapiens] /org=Homo
sapiens /taxon=9606 /dataset=nraa /length=484 
Length = 484 

Score = 738 bits (1884), Expect = 0.0 

Identities = 352/429 (82%), Positives = 382/429 (88%), Gaps = 2/429 (0%) 
Frame = +3 

Query: 84 SHYLEARSLNERDYRDRRYVDEYRNDYCEGYVPRHYHRDIESGYRIHCSKSSVRSRRSSP 263 

SHYLE+RS+NE+DY RRY+DEYRNDY +G P H RD ES Y+ H SKSS RS RSS 
Sbjct: 54 SHYLESRSINEKDYHSRRYIDEYRNDYTQGCEPGHRQRDHESRYQNHSSKSSGRSGRSSY 113 

Query: 264 KRK-RNRHCSSHQ-SRSKSHRRKRSRSIEDDEEGHLICQSGDVLRARYEIVDTLGEGAFG 437 

K K R H +SH+ S KSHRRKR+RS+EDDEEGHLICQSGDVL ARYEIVDTLGEGAFG 
Sbjct: 114 KSKHRIHHSTSHRRSHGKSHRRKRTRSVEDDEEGHLICQSGDVLSARYEIVDTLGEGAFG 173 

Query: 438 KVVECIDHGMDGMHVAVKIVKNVGRYREAARSEIQVLEHLNSTDPNSVFRCVQMLEWFDH 617

KVVECIDH G HVAVKIVKNV RY EAARSEIQVLEHLN+TDPNS FRCVQMLEWF+H
Sbjct: 174 KVVECIDHKAGGRHVAVKIVKNVDRYCEAARSEIQVLEHLNTTDPNSTFRCVQMLEWFEH 233

Query: 618 HGHVCIVFELLGLSTYDFIKENSFLPFQIDHIRQMAYQICQSINFLHHNKLTHTDLKPEN 797 

HGH+CIVFELLGLSTYDFIKEN FLPF++DHIR+MAYQIC+S+NFLH NKLTHTDLKPEN 
Sbjct: 234 HGHICIVFELLGLSTYDFIKENGFLPFRLDHIRKMAYQICKSVNFLHSNKLTHTDLKPEN 293 

Query: 798 ILFVKSDYVVKYNSKMKRDERTLKNTDIKVVDFGSATYDDEHHSTLVSTRHYRAPEVILA 977

ILFV+SDY YN K+KRDERTL N DIKVVDFGSATYDDEHHSTLVSTRHYRAPEVILA
Sbjct: 294 ILFVQSDYTEAYNPKIKRDERTLINPDIKVVDFGSATYDDEHHSTLVSTRHYRAPEVILA 353

Query: 978 LGWSQPCDVWSIGCILIEYYLGFTVFQTHDSKEHLAMMERILGPIPQHMIQKTRKRKYFH 1157 

LGWSQPCDVWSIGCILIEYYLGFTVF THDSKEHLAMMERILGP+P+HMIQKTRKRKYFH
Sbjct: 354 LGWSQPCDVWSIGCILIEYYLGFTVFPTHDSKEHLAMMERILGPLPKHMIQKTRKRKYFH 413 

Query: 1158 HNQLDWDEHSSAGRYVRRRCKPLKEFMLCHDEEHEKLFDLVRRMLEYDPTQRITLDEALQ 1337

H++LDWDEHSSAGRYV R CKPLKEFML D EHE+LFDL+++MLEYDP +RITL EAL+ 
Sbjct: 414 HDRLDWDEHSSAGRYVSRACKPLKEFMLSQDVEHERLFDLIQKMLEYDPAKRITLREALK 473 

Query: 1338 HPFFDLLKK 1364 

HPFFDLLKK 

Sbjct: 474 HPFFDLLKK 482 (SEQ ID NO: 5) 
Hmmer search results (Pfam):

Scores for sequence family classification (score includes all domains):

Model     Description                             Score    E-value  N
PF00069   Eukaryotic protein kinase domain        272.4    5.9e-78  1
CE00022   CE00022 MAGUK_subfamily_d                26.7    8.6e-08  2
CE00204   CE00204 FIBROBLAST_GROWTH_RECEPTOR        3.4        2.3  1
PF00548   3C cysteine protease (picornain 3C)       1.6        7.7  1
CE00031   CE00031 VEGFR                             0.7        2.5  1
CE00289   CE00289 PTK_PDGF_receptor               -49.9     0.0045  1
CE00292   CE00292 PTK_membrane_span              -102.3     0.0063  1
CE00287   CE00287 PTK_Eph_orphan_receptor        -117.7       0.97  1
CE00291   CE00291 PTK_fgf_receptor               -138.4       0.73  1
CE00290   CE00290 PTK_Trk_family                 -173.0     0.0023  1
CE00016   CE00016 GSK_glycogen_synthase_kinase   -239.0     0.0019  1
CE00288   CE00288 PTK_Insulin_receptor           -240.3        2.7  1
Parsed for domains:

Model     Domain  seq-f  seq-t   hmm-f  hmm-t      score  E-value
CE00204   1/1         ?      ?       ?      ?        3.4      2.3
CE00031   1/1         ?      ?       ?      ?        0.7      2.5
CE00289   1/1       120    223       ?      ?      -49.9   0.0045
CE00022   1/2       306    331     191    216        4.9     0.23
CE00288   1/1       125    353       1    269 []  -240.3      2.7
CE00291   1/1       123    368       1    285 []  -138.4     0.73
PF00548   1/1       370    378     175    183        1.6      7.7
CE00287   1/1       123    379       1    260     -117.7     0.97
CE00290   1/1       124    379       1    282     -173.0   0.0023
CE00292   1/1       123    381       1    288     -102.3   0.0063
CE00022   2/2       414    437     258    281       21.6  2.8e-06
PF00069   1/1       123    439       1    278 []   272.4  5.9e-78
CE00016   1/1        66    445       1    433 []  -239.0   0.0019
1 GCAGAAAAGT ATAAAGATGG TAATCTCTGT AGGAAATTAG TCCCCATTAT
51 TTAGCTGTAA AATTATAATT AAAAAAAAAA ATCTTTGTTT CTAAATCTTT
101 GCCACTGATT ATTTCCTGAA AATACACTCC AGGAAGAAGC ATTTTTAAGT
151 TAAAGCATGT GAACTCTTAT TTCTTGCTAC AGGTTCATAT TTCTTTTTCT
201 AGAGAGTTTG CCAAATTATA CAACGTGCTC CTTCATGCTC TCACCAATCT
251 TGGCTGTTTT GAAAGGCCAA GAATAATGTT TTGATTAAAC TGAATTTTTA
301 AATTTCTAAC GAATTTGTCC GCTGTCATAT ATTTATTGAT CATTTGAACA
351 TCTTTTTATT CTTAGCCTAT TTATTAAAGT ATTTTTATTG ATTTAGAAGA
401 GCTTTTTATT ACAATATTTT AACCATTTGT CATATATATA TTGCATAGTG
451 TCTTTTCTTT ATGATTTGTC TTTTGGAGGT AGCCTGTGAA TTGGTCTCCC 
501 TTTCTACAGG CTTAGTTAAT CCATTCTGCA TTAGAAAGAC TGATGTGGCT 
551 GTAAACCCTA CCTTTATATA TTGTGGTCAG AAGCCTGTAA CATAAAGTAT 
601 CAAGTCTTAA ACCAGTGATT CTCCAACTTT AGTGTGAATA AGAATCACCT 
651 TGGAGGTATG CTGACCAGAT TTACAGTCAG TGAGTATGAC CTAAGGCCCA 
701 GGGTTACCAT TTTTAATAAG AACTCCATAT TTGATACTGT TGATAAATAG 
751 ACCGTCCTTT GAGAAATAAT ACTCTTTAGC CTAGCACGCA GGGTTTTTAA
801 TGATGCTATT CTCAGCTTAC TTATTTGTCT ACATTCCCCT ATGTGAAAAT 
851 TGCTCTTGCT GGGATTGTCT TTTTCCTGAG TAATGCATAG ACAATTCCAT 
901 CTCTAAGCCA TTGTGGCTAA AAGTGCCATA TGAATTTAAG ATGGTAATAT 
951 GCCATTCTTC TCCCCCGGAA TTTCTTCTGT ATTCTACTTT TTCCAAATCC 
1001 TGGCTTCCCT TTAAGATGCA ACTCTATTTC CATCTTTTTT GTAATTATTC
1051 TCTGACCATT TTAAACAGAT TTTTTCCCCC ATCTCTGACT CTAAGCACTC
1101 ATGTGTTGTA ACCTTTTAGA ATTTCCTACA TTGTTGGATT TTGTTTCATT
1151 TTTATGTGAG TAATCTCAAA TTGTTCATTA TTTGTTGGCA GGGACTTTGC
1201 CTTATATAAT TTTTTTTTTA TCTCCCACAG GACCTGTGTG GATATAAAAA
1251 CGAATGCCCT TACCCTCATC CGTCTTGGCT ATTTGAAAGG CTATAGTGAA 
1301 ATATTCACTG GGCATTCAGT GGATATTTTA AAAAATTAAA TCAGTCTGTT 
1351 CATCCTGTCC ATAGCCTGTG TAATTCTGTA GACTTTGTTT ATATAATCTC 
1401 TCAGCCTTGG TCATTGGCCA TTATCTATTG AAGAGACTCT CATCCTTTTA 
1451 GTTTGTCCTC ATGGTGTTCA CTCCCATGTT TTGTTACTCT ATACGTTGTT 
1501 TATGGCTTAG CAGCTCTAAT TCCATGCAGT ATTCCAGCTA AAGATTGTTA 
1551 GTGCTAGTTT TTTCTAATAG AAGGATTTTG GACTTTTATG GGAAGGATGC 
1601 CCTTAAGAGT ATGGTCACGT CTAGCTTATT GTATTGGTGA TCTCTCCCTG 
1651 ACAGTTCCAA GCCAACTGAT CAGATCTCTA ACCTAGACTA CCCACAGTCT 
1701 TACCCAAATA TCCTGAGTTG TTTCTCCAAT AAAATACAAC TTAAAGCTGA 
1751 TGCTAGGGAA AGAGAACCGG GTTTCTGTAT CTCCCCAGCC TGGATTTGAT
1801 GCTAGCCCTA TTGGGTAGTA GTTGTAAAGA TGCTTCTATT TCTGCCTAAA 
1851 CCAGCCCCCT GGGAAAAAGA ATGACAGCAT ATTCTGGGGA AAGGAAAGGG 
1901 GTTGGTGAGG GCAATCTAGT CAACATCCGT CACTCCATTG CTTGTTAGGC 
1951 TTATTTTAGC CGATGTGTCT GACTGGGCAG GTGTCCCCTC TCTCCCTCAG 
2001 TGCTCCATGT GCATCCCTCT TGAAGCTTCG CACTCTGTTG AAGAGGACAC 
2051 TCATCCCAGG TAGAGAGGGG GACGGGAAAC TGGGCCAATT GAATCTATGT 
2101 CCTTTTCTTT CCATCAGATC AAGGCCACTT AACTGGGATC CATTGACATC 
2151 CTGAGGCCCA TGACCTTTGA AATTCCTTGC CAAGTTTTGT TTATGTGTTT 
2201 CTTAGGAAAG AGAGTCCATG GCTTTCAGCA GATTTTCAAA GGGATCTCTA 
2251 GATTAAAGCA CGATGGCACT AGATGATGGT GTTTTCTGTT GTTTCTTAGG 
2301 TATTTCTCAA ACAGGAATGA CAGGAAATTA GAAATGCAAA GGGAAGTAGG 
2351 GTGGTGGAAC TATTGTAATG CTAAACTACA GGATCCCTTT CTTATTTTAG 
2401 GGGGATATAT TTTAGATGCC TTTGGCACAT GAGGCAGTCC TCAAAAGCTA 
2451 TGTTTTCTAT TTCTCAAACA GGAATAACAA GGCTAGAAAT GCAAAGAGTA 
2501 GAGGAGACAT GATAGATGCT GTGTGTAATA AAATTGGCCT GTATAATAGT 
2551 GGTTTGAAAA TATTTTAGTT TTTGTCACTA ATGTTGTTAT ACAACCTTGG 
2601 TAAATCATTT TTCTTCTAGG GATCTTAATG TAGTCGTCGG TAAAATGAAA 
2651 GGGCTGGAAT ACATTTAAGG CTCCTTATAG CTCTAATATA CCTTTCATGA 
2701 AGGAATTCTC TCTGTGCCAG GGATATCTAA AATGCTCTTA CATTACAAGA 
2751 GAAAGGAATC CTTTTTGCCT GCCTCTGATT GTACCTCTGT GAGAGACTAA
2801 GACAGCTTAG ATACAGGTGC AGAAGGTAAA GGAACACTTA ATCAAGTAAA 
2851 CACTAGACAT GAATTAATGA TTTGACTCAA GCTTTATTCC TTGGTGTGAA 
2901 GTGCTTGACA GCAAACTCTA TAATGGGCCC ATTTGCTTGT TTGTTAAAGT 
2951 AAAATTATTT CTTAAGCTTT ATGAGATAAA TATAAATGCT AATTCATCTG 
3001 TTTGAATTTT TTTCTTATAT TGAGTTAGCT GTTTAAGAAT TTCTGAGAAA 
3051 ATGTTTTGTT TGAACCACAT TATTGCAGAA TGAAGAGAAT AATTTGAAAT 
3101 CTTTTAATGT GTTTGCAGTC ATTATTTAGA AGCAAGGTCC TTGAATGAGC 
3151 GAGATTATCG GGACCGGAGA TACGTTGACG AATACAGGAA TGACTACTGT 
3201 GAAGGATATG TTCCTAGACA TTATCACAGA GACATTGAAA GCGGGTATCG
3251 AATCCACTGC AGTAAATCTT CAGTCCGCAG CAGGAGAAGC AGTCCTAAAA
3301 GGAAGCGCAA TAGACACTGT TCAAGTCATC AGTCACGTTC GGTATGATTG
3351 GTTTTGTTTT CAATTTGAGT GGAGTTTTAT TTGTGTGTAC TCTTAACGAG
3401 CTGATAAGTT TCTAATTTTT TATATATATA TATATAAAAT ACTATTTGGA
3451 TATATTATAA TTGTATTTAT ATTACTTAAA TCCTTAAAGG AAACCTCCAA
3501 ATTCTTGTAG CTGATCTGTA TATTTATTAG CTAGCCCTCA TTTGCCCACA
3551 TTTCCTCATA TTCTGCAGAC CAGATAATGA GTTTATTGAT TTTAATAATA
3601 AAACTATTTT TTTATTTGTA ACATATTCTT ATGAAAAAAT CATGCACCCA
3651 TATCTTTTCT TTCATCTTAA GCATTTTTTT TTTCTTAGAA ACCCTTTATC
3701 TGGTACTTGA AAATAAATGT GAAATATTGC ACTGGTGGAC ACCTGAATGT
3751 TACTAACCTG CATAGAGCAT AGTTCCATAG TCCAGTGCAT CATTGTCTGC
3801 AATGAATTCT TTTGAAGTTG TGAAAATGGG TGCTGAATGG GAAACATCCA
3851 AAAAGTCTGC CCCCCCCTTT TTTTTTTTAA CACTCAGACA TCTTCACCTG
3901 CTTGAACAGT GAACTTTGAA TTAGTTTCTC CCCAAGTTTT CTTCAGTAAA
3951 ACTAGTTTTT ATTAGATTGA ACATTGAAAT TAACTAGCCT TTATTTTCCC
4001 CTTTTATTTT AATCATGTAT ATTTTAAAAT ATTGCTAAAT TAGAATAATT
4051 TCAAATAGTC TTGACATTTT AAAACATTTT TCTGAAAAAC TAGACATCTC
4101 AATTCACAGC ATATGCTGTT TATAGCAAGA GATAAGTAAA TCATGACATT
4151 GCATTCTTTA AATTTCAGAC TTCAATTAAA TCAGTATTTT AAAGAGACAA
4201 TTGTGTTGTT TTTTTCTATT GCCACTTTAA GTATCTTATC TGAAAATCTG
4251 TTCCTTGCCA TGTTTTTCTT CTGTAACATA AACTGTGCCC TGTGAATTTC
4301 TGGGGACTGA ATTTGAAATT GCTCCTGCCA ACTGTTCGTG GCCTGGTGCT
4351 TATCTGAATG CCTGAATATC TCCCCGCTGA ATGAATTGCG TATTCTGCCC
4401 TGAATTCACT CTGATATATT GATTGGCTGG ACGATCTTGG TGCTGCCCAC
4451 TTGCCGTTCC AGAAGAGCCA CCGAAGGAAA AGATCCAGGA GTATAGAGGA
4501 TGATGAGGAG GGTCACCTGA TCTGTCAAAG TGGAGACGTT CTAAGAGCAA
4551 GATGTATAGA ATATTTTTCA ACACTTTTTA AACTTTGCAG AAAGAATAAT
4601 CTTTTTAAGA ATAGTTTGTC AGCGGGGGGC TAAAGAACTC TTCATTGCTT
4651 TTTTATTTTG CTTTTTGTGG GTTTGTTTGT TCTTTTATAT TTCTTCTTTT
4701 CTGTAGAATT TAAATATTTC TATTCTAAAG TTCCAAAATA ATCAGTGGAA
4751 TTTGAGATTA GAGCAAGAAA GATAGCTCTA TCTAATTGTT TTTGTAGCAG
4801 CTGAAACTAA AATAATTTGA GTGCTGAAAC CTTAGTTATG CTTTGTTAGA
4851 GATCATTTGA AAATATTCCA CACTTAAGCA TTCATTGTTT GAAGAACTAG
4901 ACAGTTTGTA CTCAGGTACT TACACCTCTT TTTCCCTCCT CACTCTAGAT
4951 GAAATCGTGG ACACTTTGGG TGAAGGAGCC TTTGGCAAAG TTGTAGAGTG
5001 CATTGATCAT GGCATGTAAG TTTGTTTTTT CCTTTTCAAA CATTCTGATG
5051 TTTTTGGTGG GGAAAGATTC ATAATTCAGA TGAAATTTTA TTTATTTATT
5101 TATTTGAGAT AGGGCCTCTG TTGCCCAGGC TTGAGTGCAG TGGTGCTATC
5151 TTGGCTCACT GCAACTGCCG CCTCCCGGCT TCAAGTGATT CTCCTGCTTC
5201 AGCCTCTCAA GTAGCTGGGA TTACAGGAGC CTGCCACCAC ACCTAGCTAG
5251 TTTTTGTATT TTTAATAGAG ATGGGGTTTC ACCGTGTTGG CCTGGGTGGT
5301 CTCGAACTCC TGACCTCAAG TGATCTACCC GCCTCAGTTT CCCAAAACGT
5351 TGGGATTACA AGCCTGAGCC CCTGTGCCCG GCCAAGATGG AATATATTTT
5401 AAATGGTAGC CACGTGTTTT GGGGGGTAAA TTACTCACCA AAGTTTCTTG
5451 AACTTTGTAT GATTTATTTA CCGTGAATGT GGATCTTAAG AATGCTGACT
5501 GCCGGGCACA GTGGCTCACT CCTGTAATCG CAGCACTTTG GGAGGCCAAG
5551 GCAGGTGGAT CACCTGAGGT TGGGAGTTCA AGACTAGCCT GACCAACATG
5601 GAGAAATACA TTCTCTACTA AAAATACAAA ATTAGCCAGG TGTGGTGGCA
5651 CATGCCTGTA ATCCCAGTTG CTTGGGAGGC TGAGGCAGGA GAATCACTTG
5701 AACCCAGGAG GGGAGGAAGG CGGAGGTTGC GGTGAGCCAA GATTGTGCCA
5751 TTGCACTCCA GCCTAGGCAA CGAGTGAAAA TCCGTCTCAA AAAAAATAAA
5801 AATAAAAAAA AAGAATGATG ACAAATTTCA ACAGGGGGAA ATCATTGAAA
5851 TTAAAGTGGA TGTTCAAGTG AAGGAATTTC CCAGAACTCC AGAACTGAGG
5901 CCCTTGACCC TGTATATAAG ATTTGGCAAT TTCGGATTAC AGAGGCAATA
5951 AAGCATGTCT AATCTTAAAT GTTAAGAGTT AGCTTCCTAA ACTATAAAGA
6001 CATTTTATTA TCTAGGGCCT AGAGAATAAA GTTTGTGATT TGACCCTTTC
6051 TGCCTCATTT TACCGTTTTC CTCTAGGACC TCTATTTTGT GGCTTGAAAA
6101 CTTTTGTAAG AGAAGCTCTT AGAACTTTTG CGAAACTTCA CATTTCTAAA
6151 ATGACAAAAT TTTTTATCAT AAATTATTTG GGAAGGATGT AATTTCCAAC
6201 CTGTTGTAAA TATTAATATT AAAAAATAAA ACTTACCTCT CTCTAAATGC
6251 ATTTCAGGGA ATCTAAATAC CATAGCAGCT TGATACCTAC CATCATCCAT
6301 AAACAAACTC TTCTTGAATA CTTAGAAATG TTTTATTATT GAATTTATTG
6351 TCATTTCACT TTCCATAAAT ACTATCCTAA ATTATCCCCA CATTTTGCTT
6401 TTCTGCAACA AATATGTGAA TGTAAATTGA ACTTTAAAGT ATTTTGAAAT
6451 ATTTTCAGAC TTACAGAAAA ATTGATAAAA TAGTTCAAAG AATTCCCATA
6501 TATTCCAAAT GTTAACCTAT TTTCCAAATG TTTACATTTT ATAAGATTTG 
6551 CTTTATCATT ATACATACAT TTGTTTTCAA ATTTTGCCAA CTAATCTGCA 
6601 GACTTTATTC AGATTTCACC AGTCATCCCA TTAATGTCCT TTTAGAATTT 
6651 CTTGAAAGTC TAAGTCTTGG TGTATTTAAT GAAATGTATC TTAAAACAAA 
6701 TTTTTTTTTA ATGAGATGGA GTCTCACTGT GTTGCTCTGG CTGGTGTGGA
6751 ACTCCTGGCC TCAAGTGATC CTTCTGCCTC AGCCTCCCAT AGTGCTGGGA 
6801 TTACAGGGTG TGAGCCCTGT AGTCACGTGT GGCACACACC TGTACCACAT 
6851 CTGGCCTGGA ATGTTTTCTT TATTGGGGCA GTTGAGGCCT CTAAAAAATG 
6901 AGTACATATA GCCATAGATA AATATCTGAC TGTCTAGCAT TGTATGTTTT 
6951 CTTTTTTCAT TTTCGTGGAT ACAAGCACTG AGAAAACTTT TTGGTCATAT
7001 AATTAAATAG ATAGGAGTAG AAGCTTTGTC ACAGTAATCT TATTAGAGTT 
7051 CTTTTAAGTC TTGAGGTATA TGCCAAGCAT TAAAAAATTT TTTTAGTGAC 
7101 TTATCAGTTC ACATTCGTTG GGGCCTTGTT GAAAGCAATG AACTGGAAAC 
7151 CACTGGATGT GGAAAAAGGT TTTGTATCCA GCCATTAGAA TACGTGTTTG 
7201 TTTGCCCCAA ATGTTTTTAT AGCCTAGGGC ATACATCCTG TTACACTAGT
7251 AAGAGATGGG TATGGTTTTG TAAAGTGGAA GGGTCATAGT GAAAAAGAAG 
7301 GCTTGAATGC TGGCTCATCT GTAGGTAGAT TAGGTTTAAA AAGGAAGACA 
7351 AAAATAAATT GAAGATTTGC AACATTTATG GCTCTATACT TTTTAGGAAG 
7401 CATTCTTACA GATGCCGCAG TCTAAAGCCC ACTGCCCTCC CCTGTAGCTG 
7451 TTTCTGTATA CTGGCATCAG TGCATCTGCT AAGGTTTTTC TGGGCTTCAT
7501 TACTTAGAGT TGGGGTCTCC TTTACCTGGA TGTTTCCTTC CCAATCTGAC 
7551 AAACTCCCAG CTATCTTTCA GGACTCAGTT CTGTGTCACC TCTTCTGTGA 
7601 AGAAGTCTAA GTTGTTTCTG TGTCTGTCTT TTCCATTAGA CTTTGAAGTA 
7651 CGTAGGGACA CACCCCGTCT TTTAATCACT AATATCTGTG CATTGCCTGG 
7701 CACAGAGTAG GCCTAGCCTG GTAAATGAAT GAATGCTTTC AACAGTAGCA 
7751 TATCCTATTT TTGGTTTACA TTTGTATATA TCTTTTAAAA GTGTTGTTGT 
7801 ATAAAATGTA ATTAAATTTA AAATTCTAGG AGCAAACGTT AAAACTCATA 
7851 AGTATTAAGG GAATTATCAC TTGATATAAA GTATTTTATC AAAATGTTTT 
7901 AAGAAGATGT TATATGGAAT CTGCTATAAT ATGTTCTGAA AGATTATTTT 
7951 AAATGGCATA GAGGAATTGG TAATTAAGAT TATGCTTTAG AGCATAACAT
8001 GGCTTCAGCT CACTCTTGTA CATTTATCAT TTTTATCTTA ATTTTATTTT 
8051 TAAGGGATGG CATGCATGTA GCAGTGAAAA TCGTAAAAAA TGTAGGCCGT 
8101 TACCGTGAAG CAGCTCGTTC AGAAATCCAA GTATTAGAGC ACTTAAATAG 
8151 TACTGATCCC AATAGTGTCT TGTAAGTATA ACTTTCACCT AGGAGCCATC 
8201 ATATTACATG AAATATTCAG GTTTCCATAA ACTGAATTAT TATTTTGCTC 
8251 TGTTTTAGCC GATGTGTCCA GATGCTAGAA TGGTTTGATC ATCATGGTCA 
8301 TGTTTGTATT GTGTTTGAAC TACTGGGACT TAGTACTTAC GATTTCATTA 
8351 AAGAAAACAG CTTTCTGCCA TTTCAAATTG ACCACATCAG GCAGATGGCG 
8401 TATCAGATCT GCCAGTCAAT AAATTGTAAG TACACTTGAT AAATCTTTAT 
8451 TTTTATTTAT TTATTTATTT ATTTATTTTG AGACGGAGTC TCGCTCTGTC 
8501 ACCCAGGCTG GAGTGCAGTG GCGCTCTCGG GTCCCAGCAA GCTCAGCCTC 
8551 CCGGGTTCAC GCCATTTTCC CGCCTCAGCC TCCCGAGTAG CTGGGACTAC 
8601 AGGCGCCCAC CACCATGCCC AGCTAATTTT TTGTATTTTT AGTAGAGATG
8651 GGATTTCACA GTGTTAGCCA GGATGGTCTC GATCTCCTGA CCTTGTGATT 
8701 GCCCCCCTCG GCCTCCCAAA GTGCTGGGGT TATAGGCGTG AGCCACTGTG 
8751 CACAGCAATA AATCTTTATT TTTAAATATT TTTTATGTTT GTACCTCCTT 
8801 AACAATTAAG ATAAATCTTT AAGCACCAGA AAACTTGTTT TTATTATACA
8851 AGCTATATAT CCAAATGTTG TCACTAAAAA AACAGACATT TTACAAGTAA 
8901 AGATGAATCG TCTCTTGACC ACTATATCCT TTGCCAGTCC TCCTTTCCCT 
8951 CCTAGTACAA ATTAAGTTTG TAAGTGAAAC TAATAATGTG CTTTTGTTCT 
9001 CTTGTAGTTT TACATCATAA TAAATTAACC CATACAGATC TGAAGCCTGA 
9051 AAATATTTTG TTTGTGAAGT CTGACTATGT AGTCAAATAT AATTCTAAAA 
9101 TGGTAAGTTA AAGACTTGTT TTAATTTGGG TGGTTGTCTT TAAAATTAAT 
9151 TTAACTTGAT GATCTTTGGA TGAGGAATTT CACTTCTGAG CCTTATTATA 
9201 TCCTGTTGTT TAACCAAAAA GAAGTAATCC TTCTTTGCCT TTCTCATGAG 
9251 CTTACTTTGA CAATCAAGAA GATAATTCAT GTGCTGGCCT TTTGAGTAGC 
9301 GCTATAAAAT GTATCTATTG AGTTTCATGT TTACTCAACT GTGTCTCTCT 
9351 AGAAACGTGA TGAACGCACA CTGAAAAACA CAGATATCAA AGTTGTTGAC 
9401 TTTGGAAGTG CAACGTATGA TGATGAACAT CACAGTACTT TGGTGTCTAC 
9451 CCGGCACTAC AGAGCTCCCG AGGTCATTTT GGGTCAGTAG ACACCAGGCT 
9501 TTCTAATATT ATAATTGAAG AAGAGATTTT TGTTCTTTAC AGCTTTACTG 
9551 GTGGGGTGGG GAAGTATGAT CTTCTCAGCA GGATTCAGAA AACGTTTTCT 
9601 ATTTTCATAA AAAATGTGTG GACATTGCTA TAAATACTTT TCCTGAGTGG 
9651 TAAACATGTG ATACTGTCTG GGAAAGATAT TCCAGGTGGT GGTTATTTTT 
9701 GAACAAGTAA ATCTTAAATG ATCATAAGAG AACAGGCTGT GTTAGCTAAA
9751 TGCATCAAAG AAATGTGATT TTGAAGTTAT ATGAGTACCT ATTTTCATGC 
9801 CATCACAAAA GCACATGGCT GGTAAAAATA CTGAGGAAAC TGGTTGGCAG 
9851 ATGTCTAGAA TATAGGATGG ATAAAGGTCA AGAGAAGAAA GAGGCTTCTC 
9901 TAAGAGCTCC TGTGATAACC CTTGATGTGA GAAAGTCTGG GAAAGAAAAT 
9951 GAGTTAAGGT GCAGAGTTTT CAAATAAGAA GGGACTTATT AAGGGAGTGT 
10001 TATGCCTCAA CATTAAAAGT TATAGATCAG GTGTGTTAAT AAATCAGGGA 
10051 AGTCAGAGAT TGGCTTGGGA GCTTGGAGAC ATTGGGAAAC ATTCAGATCA 
10101 GGCATATCAA GAGAGTTGAA TGTAATAAGC TGATTACTTA GCCTAAAGTT 
10151 AGGTCCAACT GAGGTTAGAT TGTAAAGCAT TTTTGTGGAA TCGTATTTTA 
10201 ATACTTTTTA CTTTTTTTGT GTGTCCAACG GGACTTGGTA GTTCAGAATA
10251 GGAGTGTAAA AGCAAACTCT TGATACTTAC CTAGAGTAGA GTAGTAAAGG 
10301 AGTGAGGAAA TCAAGAATCC TGTGCAGCTC TTGCCCACAG AACTTCCCTT 
10351 GATGACAGAA ATGTTCCATT TCTGCACTGT CCCATATGGT AGCCACTAGT 
10401 CACTGTGCGT GACTGACTAC CTTGTAGTGG GGCCAGTGTG ACTGAGGAGA 
10451 ACTGAGTTTT GAATTTACAT TAATTTTATT TCAGATTTAA ACAGCCACAT 
10501 GTGGCTAGTG GTTACCATAT TGAACAAGCA CAACTCTTAG AGCTTGTCTT 
10551 TTAAATGCGT AATAATAGGG TTTCTGCGTA GTACAAATTG AAAGGAGCTA 
10601 CTGTGTAAGG GTAAAAGAAA GCAATATGGG AAGAGATAGT GGACAGAGAG 
10651 GTATTTTCAG AGATTAGAAG GCAATAGATT CCTCATTTTA AGAATCAGAT 
10701 TTTTCCCCAA ATATTTGGCA TTTTTTCTTT GTTATTGGTA TATCAAACAG
10751 TGGTGCATCG TACAGTGTGC TATCCTAGAT TGAGTAAAAT ATAGTATATA
10801 GTAACCCCCC CCTTTTTTTT TTCTTTGAGA TGGAGTTTCA CTTTGTCACC
10851 CAGGCTGGAG TGCAGTGGTA CCATCTCGGC TCACTGCAAC CTCCACCTCC 
10901 CAGGTTCGCG CGATTCTCCT AACTCAGCCT CCTGAGTAGC TGGGATTACA 
10951 GGTGCCCACC ACCACACCCG GCTAATTTTT ATAGTTTTTA GTAGAGATGG
11001 GGTTTCACCA TGTTAGCCAG GCTGGTCTCG AACTCCTGAC CTCAGGTGAT 
11051 CCTCCTGCCT CGGCCTCCCA AAGTGCTTGG ATTACAGGCG TGAGCCACCG 
11101 CGCCCGGCCA AGGATTTTTT TTTTTTAATT TTTATGTTTT TTATAACAGA
11151 GACAGGGCCT CACCATGTTG CACAGGCTGG TCTCGAACTC CTGGGCTTAA 
11201 GTGATCCGCC TGCCTTGGCC TCCCAAAGTG CTGGGATTAT AGGTGTGAGC 
11251 CACCGCACCC ACCAGAATAT GGTCAATCTT ATTAATAAAG TTCCAAATGT 
11301 GGCCAAGCAA GGGATAGTAC AAATCTGAAA TTGGAGTCCC TGGCCTTGAG 
11351 GAGAAAGAAT CAGGAGATTG GGAGAATAGA AAGGTCCTTT GTTTGTGGAG 
11401 TGAGGATGAA GGCATAATGC AATTGGAGGG GAAAATGTAG TCAGGTGCTA 
11451 GAGTTGAAGT AGGCAGTTGG CCTTATGTTG GGTATAAAAG CTAACTCATC 
11501 CAAGAATGAG ATGATTTAGA ATGGTGTACT GCAGAAGATT ACAGTCACCT 
11551 GGGAAAAGAC TAAATTGGGA GATAGGAGTG GTTGAAAAAT AAAACTTTTT
11601 TTTTTTTTTG AGACGCAGTC TTGCACTGTC ACCCGGGCTG GACTGCAGTG
11651 GCACGATCTC GGCTCACTGC AACTTCTGCC TCCTGGGTTC AAGCGATTCT 
11701 CCTGTGTCAG CCTCCCAAGT AGCTGGGCTT ACAGGTGCCC GCCACCACGC 
11751 CCAGCTAATT TTTTGTATTT TTAGTAGAGA TGGGGTTTCA CCACATTGGC 
11801 CAGGCTGGTC TCCAACTCCT GACCTTGTGA TTCACCTGCC TTGGCCTCCC 
11851 AAAGTGCTGG GATTACAGGT GTGAGCCACC GTGCCTGGTT GAAAAATAAA 
11901 ACTTTTATGA GGTCCAAGCT CTAGCATTTA CGGATTTTGT ATGTGTTAAT 
11951 AGGTAGAAAC CATGCTCCAT TATTTATTTA TTTATTTTTT GAGACAGAGT 
12001 CTCACTCTGT TGCCTGGCCT GGAGTGCAGT GGTGCAATCT CAGCTCACTG 
12051 CAACCTCTGC CTCCCGGGTT CAAGCGATTC TCCTGCCTCA GCCTCCTGAG 
12101 TAGCTGGGAT TACAAGTGCA CACCACCACA CCCAACTAAT TTATATATAT 
12151 ATATATATAT ATATTTTAAA ATTTTTATTT TTTATTTTTG TTATTTGTTT 
12201 ATTTATTTTT TTGAGATGGA GTTTTGCTTT TATTGCCCAG GCTAGAGTGC
12251 AGTGGCGCAA TCTCAGCTTA CTGCAACCTC TGCCTTCCGG TTTCAAGCCA 
12301 TTCTCCTGCC TCAGCCTCCC AAGTCACTGG GATTACAGGC GTCTGCCACC 
12351 ACGCCCAGCT AATTTTTTTG TATTTTTAGT AGAGACGGGG TTTCACCATG
12401 TTGGTCAGAC TGGTCTCGAA CTGCCAACCT GGTGATCCAC CCGCCTCGGC 
12451 CTCCCAAAGT GCTGGGATTA CAGGCATGAG CCACCGCGCC TGGCCCATGC 
12501 TCTATTATTA TCCATTTGTT CAAATGACAG ACACTGGAGC GGATGGTTAA 
12551 CAAAAATGAC TTAAGTCATT ATATATTGAC TTGAATATAT TTCTTCTTTT 
12601 ATCTTTAACT TCAGTGATAA TGAAAGTAAT TGAAATGTCT TTGAATGTAG 
12651 ATTTTATTTA TACATTTTTT AACTAAATAT TTGATCTTTG AAATATTAAA
12701 ATATCTATGT GGTTGGTTCT TTCTCCTTCC CAGTCAGTAT AGATTTAAGA 
12751 AGGCTAGATG TTTTATTCTG ATCTGAATAA TACTGTCATT GAGAATTCTG 
12801 AAGGAGAAAG TATATAAAAT CATGTATAGA CAGCGCCGAT GTTTATGTAT 
12851 AGATCCCTCT CTGAGCTCCA ATGTGTCTGT AATTTCTGCT TATAGGTGAA 
12901 ACTGCTTAAA ATTCCCATTA TACCTTTTAT ACAATTTGTG CAAAACGGTA 
12951 ATATTTCTCT TAACGGAAGA AGTAAACTCA TGCATCAAGC TGATGATAAT
13001 TGATAAGGCA TTAGTAATTT CATTCTGAGG ATAATTATAA ACCTGTATTT
13051 GTGCTAATAA AATATAAAAA TTCTTGGACT AACCATGAAC TGAGCATAAT
13101 AATGGTTTTA ACAGCAGTGC TCTCCCATTA TATAAACAGT TCAGAGACTA
13151 TGGAATATTT GCACGAATTG GTTGTATACT TGGAAAATGG TAGCCCCCTT
13201 TTATTTTACA TAACATGCAC CCCTCCCTAG TTAGAATACT GTGTCTTGAT
13251 GTGAGCATAT GGACTATGGA GTGTGTTGAA TAGCATTTGC TGTAAAACTA
13301 GAACTATAAA CTCTGAATTT GGTGTCTTAT TCTCCCAAAT GGGTTCTGTA
13351 AAGGGAGCAC TCATATAGGG AAGGATTTAA TGTACTGTCA ATTAAAAGTT
13401 TTTGCATAGT AAAATGTTTC TATTTGTTTT AAAATAGCTT TAGGTTGGTC
13451 TCAGCCTTGT GATGTTTGGA GCATAGGTTG CATTCTTATT GAATATTACC
13501 TTGGTTTCAC AGTCTTTCAG GTACGTGGCT AGTAAATTCC ATTTAATAAT
13551 TCATAACAAA TTGTAAACGT TAAAGGTATG CTAAAGTTTT GACTTCCATA
13601 TTGGAAAATT GCCATACATC ATTATTCTTG AGATTAAAAC TTAGGCAAAA
13651 TGGTCATTCT TTAAAACCAC AGTTGAATGA AATATTACTA TGAGTGAGTG
13701 ATCATAGTTA ATTTTGCATG TGATTAGTGT TTGTAACACA TGGTTCATAT
13751 ATGGTTCATA CTGTCTCCTT TTTTAAATTG TAGAGCTTCT TCATAAATTT
13801 GCAGTAGTGT TAATGTGGCC AGTTTTCAGT TATAGTTATG TTGACTATCA
13851 ATATGGCCAT GAACGAGTCA CTTATTCCTT TTTATAAAAG AATTCAGGAA
13901 CAACAAGGGA TTGTATTTTA CTCTTAAGTA TTAAGCATCT ATAATGTCTT
13951 AGGCATTTCT AAGTATAAGT ACATAAAGGT GAAGAGACAA CATCTTTCTC
14001 AAGTCATGCA AAAGACATTG GAAAGTTATC GCAGTATAGT GTAGCATTTG
14051 CTGTGATGGA ACAACGTAGA AAGTGTAGGT AGGGAGGGCC AGGCGGGGTA
14101 GCTCACACCT GTAATCCCAG CACTTTGGGA GGCTGAGGTG GGTGGATCAT
14151 GAGGTCAGGA GATCGAGACC ATCCTGGCTA ACATGGTGAA ACCCTGTCTC
14201 TACTAAAAGT ATAAAAAATT AGCTGGGCGT GGTGGCGGGC GCCTGTAGTC
14251 CCAGCTACTC GGGAGGCTGA GGCAGGAGAA TGGCGTGAAC CTGGGAGGCG
14301 GAGCTTGCAG TGAGCGAGAT CATGCCACTG CACTCCAGCC TGGACAACAG
14351 GGTGAGACTC TGTTGCAAAA AAAAAAAAAA AAAAAAAAAG ACAAAGTGTA
14401 GGTAGGGAGA ACCCAGGAAA GGTTAATAAT TACTTTAGAG AAGGCGTCAC
14451 TGAGAACATA GGAAGAGGAG GAGGAGTTAG AAAACTGGAG TGCAATGGGC
14501 ATATAAGGAA GAAGAAATAG TATCTGTAAA TGCACAGAGG AGTAAAGGAA
14551 CATATTCTAC TCAGGGAAGA ATAGCGTTGT CAGAGTGTCT TGTATAAATG
14601 GGAAAATTAT AACAATAGGC AAGGATCAAT TCATAAAAGA CTTCGCAAGG
14651 TATTGGTTTG ATCCTAGAAG TCAGTGGATT CCAAAAGTAG ACTGGTCCAA
14701 AATGAAAATG GTTGTCTAGG TTTGCCATTC TGACCCTTAT TTAGAGATTA
14751 TCCCTCCTGC TTTTTTTTTT TTTAATGTCT CTTTTATGTA ATGATAGTCA
14801 TAGTTGTTGG TAGTTTGCTT TTAAAAATAA AAAGTCCTTA ATTGGTAAAA
14851 CAAAAAGTAG GAAACTCTAC TTTCTTTTCC ACTCTGTCCT TAAGTTGTAC
14901 TTACATCTGA AATCTTAATT TTTTTTTTTT TTTCCCTGAG ATGGAGTCTC
14951 ACTGTGTCAC CCAGGCTGGA GTGCAGTGGC GCAACGTCAG CTCACTGCAA
15001 CCTCTGCCTC CCGGGTTCAA GTGATTCTCA TGTCTCAGCC TCCCAAGTAG
15051 CTGGGATTAC AGGCACGAGC CACTACACCC CACTAATTTT TTGTATTTTT
15101 AGTAGAGGGT TTTGCTGTGT TGACCAGGCT GGTCTCGAAC TCCTGACCTC
15151 AAGTGATCTA CCCTCCTTGG CCTCCCAAAG TGCTGGGATT ACAGGTGTGA
15201 GCCACCGCAC CCAGCCTGAA ATTTAAATTC TTGAAAGCTT TAGGTGATGC
15251 AACCATTGAA GAACTTTAAA TAGGGTCATG GTATGATCGA GGTGTTGTGT
15301 TGTTTTGTTT GGGGAAGAGG GGCTGGAGAT CCCAGCTAGT ACTGTTGAGG
15351 TTGATTTGAA GTTAGAGCAG TGCAGGGGGC ATGCAGCTAT GATGGGCTAA
15401 GAGTCACTTA GGCAGCTGTT GCACAATGAT GAATTCCCTG TTCGTGGGGC
15451 ACCTCGCCAG ATTTCTGTTT CTGTCTAATC TGTAGAGATC CTGTTGAAAA
15501 GTACTCTGAG TTTATAGATA AGTTTGATGT CTTAGAATCA TGGTTATTAA
15551 TCAGTTCTGG GAGGTATTGT CTGGTTTTGC AGTGGTGAGC TGTAGGGTCA
15601 AGAAAAAGTT AAGCAAAGTG AATGCTTTCA TCAATCTGAC TAATATGAAA
15651 TGGATGCTTC CGGTGATTTT GTGATTATAA ATCACTTTGA GTTTTAAATG
15701 AAGTATATAT TATTTGAGAG GTGGTTTATA TTTTAACTCC ACCCTGCAAA
15751 ATACTCTTAA ACTAAGGAAT TTCTTTAAAA TGTGAAGCTA GTATTACTTA
15801 TTCCTGTCAT GTATCACAAC GATTTGGAAG CAATATGCAA GGCACAGTAG
15851 TTGATAGATT TCTTTTAAAA GTGTTGCATA CAGCCTCTGC TCTCCAGAAC
15901 AAGGGTTAGC AAACTTTGGC CCATGGTGAA ATCCTGCCTG GTGCCTGTTT
15951 TTACAAAAAG AAGAAGAGTA TGCAATAGGG ACCACTCATG ACGAGCCAAG
16001 CCTAAAATAT TTACTATCTG GCCCTTTACA GAAGTTTGCC AACCTCTGCT
16051 CTAGAAGCAT ACCATTCCAG CTGTAAGTTT GACCGTTTTC TGTATTCTAC
16101 TTCAGCCAAG CCTCCGTTAC TAATTTAAGG ATATGTGCTT TGACATGGGT
16151 TGATAGCTTA ACTTTCCTCA TATATGAGCT ATATGACTTT GAGGTAGTAT
16201 CTTAACCTTT TTGAAATTCA TGTTCCCACA TACCTAGCTC AGAATTGTTT 
16251 AGAGAATTAT TGGGACTGTA TGTATGTCTG TTGCCTGGGA GTAGTAAGTG 
16301 TTAACAAGTG AACTATTCAT TGGGTACTGG ATGTTAATTT TGGTTAAGCA 
16351 GCTGATTAAA TGAGGAGACA GTTTTTCTGG TAACCTTGCC CAGTTATTCT 
16401 TTAAACAGTG TAAGAAGTGC AAATAAAGAA GGAAACTAAA ATTTTAGATT 
16451 AAACAAGTTA ATGTGTTTGT AGGGAAATGG AGAGTACTAA ATTTCTTTTT 
16501 CTTACATGTT TTAGACTCAT GATAGTAAAG AGCACCTGGC AATGATGGAA 
16551 CGAATATTAG GACCCATACC ACAACACATG ATTCAGAAAA CAAGGTATGT 
16601 TTTAAGATTC AAGACTTTTG TTGGATATGT GCAATAGCAT ATATTCAAAC 
16651 TACAGAAAAC CCAACGTTGT TGTAATACTG ATTCCAAGGA CTATAGATTT 
16701 TGACTTTTTT TTTTTTTTCT GTACTGGAGG TAACTTCTAA CTTCATCTTA 
16751 CTCCTTTTTT TTTTTTTGAG ATGGAGTCTC ACTCTGTCAC CCAGGCTGGA 
16801 GTGCAGTGGC ACGATCTCAG CTCACTGCAG CCTCTGCCTC CTGGGTTCAA 
16851 GTGATTCTTC TGCCTCAGCC CCCTGAGTCG CTGGGATTAC AGGTGCCCAC 
16901 CACTATGCCT GGCTAATTTT TGTATTTTTA GTAGAGATGG GGTTTCACCG 
16951 TGTTAGTCAG GCTGGTCTTG AACTCCTGAC CTCAGGTGAT CTGCCTGCCT 
17001 TGGCCTCCCA AAGTGCTGGA ATTACAGGTG TGAGTCACTG CACTAGGCCA 
17051 TGTTTTTAAA AACTAATATA ATAAAAAATA TTTACCTTGT GATCTAGTGC 
17101 AGGGGTCCCC AACCCCTCGG AACTGGGCTG TACAACAGGA GGTGAGTGGC 
17151 GGGTGAGTGA GCATTATTGC TGCCTGAGCT GCACCTCCTG TCAGATCAGC 
17201 AGTGGCATTA GATTCTCATA GGAATGTGAA CCCTATTGTG AACTGCGCAC 
17251 GTGAGGGATC TACGTTGCAT GAAGGTTCCT TATGAGAATC TAATGCCTGA 
17301 TGATCTGAGG TGGAAGTTTG ATTCCAAACC ATCATCCCTC GTCCCCGGAT 
17351 CTGCTTCCAT GAAACCGGTC CCTGGTTCCA AAAGGGTTGA GGACCACTGA 
17401 TCTAGTAAAC AAAATGGCTT TTGGGTTTTT TTTGTTTTTT TTTTTTTTTT 
17451 AACTCAAGTT TACGTTTGGC ATAAGTGTTT TCTTAGGCGA TGTAAAAATA 
17501 ATACATAGAA TATGGAAAAG CTTGTGTTTT GGAATCATAT CACTCTAAGT 
17551 GTGAAATTTA TTCTGTCCTT AACCAGCTGT ATATTGTTAG ACAAGGTGGT 
17601 ATTTCCAAAC ACAGCTTCAT CGCAGAAGCC ACCGAGGGAG TTCTTTAAAG 
17651 ATTTCCAGCC CCATTCTAGA TCTAGTGAAA ACAGAATTTT AGGACTGGAT 
17701 CCAGGGGGCC CCTAGTTTTA AGCTGACATT GTTCCATATG TGATAGGAAC 
17751 AACTTAGTTG AGAGACTAAA ACCTCACAGG GTGGAGGATA TGAGGTGTCC 
17801 GATATATAAT TGTTGCTGAG GTTTTTAAAA ATTGTATGCA TCTATATTAT 
17851 ATAAGTCTAT ACACTTAGAG AGAGCTGCTT TCCATGTCTC CCCTCATGGG 
17901 TGCAGGGTAA AGATACGACT CTTGTTATTT TACTAATCCA GATTTTTTTT 
17951 TTTTTTCTGT AGAAAACGCA AGTATTTTCA CCATAACCAG CTAGATTGGG 
18001 ATGAACACAG TTCTGCTGGT AGATATGTTA GGAGACGCTG CAAACCGTTG 
18051 AAGGTAAAAG AAAAAAGATT AAAGGTTAAA TAAACCACGT GTTTGCACTA 
18101 TTAATAATTT TTTTTAAAAC AAAAACATTT CTCCCCCAGG AATTTATGCT 
18151 TTGTCATGAT GAAGAACATG AGAAACTGTT TGACCTGGTT CGAAGAATGT 
18201 TAGAATATGA TCCAACTCAA AGAATTACCT TGGATGAAGC ATTGCAGCAT 
18251 CCTTTCTTTG ACTTATTAAA AAAGAAATGA AATGGGAATC AGTGGTCTTA 
18301 CTATATACTT CTCTAGAAGA GATTACTTAA GACTGTGTCA GTCAACTAAA 
18351 CATTCTAATA TTTTTGTAAA CATTAAATTA TTTTGTACAG TTAAGTGTAA 
18401 ATATTGTATG TTTTGTATCA ATAGCATAAT TAACTTGTTA AGCAAGTATG 
18451 GTCTTGATAA TGCATTAGAA AAATTAAAAT TAATTTTTCT TTTTGAAATT 
18501 ACCATTTTTA AATACCTTTG AAATATCCTT TGTGTCCAGT GATAAATGTG 
18551 ATTGATCTTG CCTTTTGTAC ATGGAGGTCA CCTCTGAAGT GATTTTTTTT 
18601 GAGTAAAAGG AAATCTTGAC TACTTTATAT TCTTAAAGGA ATATTCTTTA 
18651 TATACTTCAA ATTTAGAACT TAACTTTAAA AGTTTTTCTT CTGTAATTGT 
18701 TGAACGGGTG ATTATTATTA ACTCTAGATA AGCAGGTACT AGAAACCAAA 
18751 ACTCAGAAAA TGTTTACTGT TAGAATTCTA TTAAATTTTA AGTGTTGTAT 
18801 TCTTTTTCAT TGGGTGATGT CAGGGTGATA ACCAGACATT CATGGAAAGG 
18851 CATGCAGTTT GTCCATTGTG ACAGTTTGTT TAATAAAACC ACATACACAC 
18901 TTTATTTAAG ATTAAAATCT AACTGGAAAG TCAGCTTGGA AAATGGACAT 
18951 TTCCAAGTAT GTTTGGTGAG TCACAGATAT AAAAATAGAA ATTCTGATGA 
19001 GAGGTTTCAG TTTTTAATAC CAAGTCCTTA GGAGTCTTAA CATTGGCCAG 
19051 CATCTGTTTA TCAAATGACA TAAATACGTA AACCTATAAG AATTAAGTTT 
19101 ATTAATTAGG CAATTTATGT CTGTGATAAT TCTTACGGGA GAAAGAGGAT 
19151 TTGATTGGAA AGCAGTTTGG GAAGAAAGTG CTGCTGAAAT TTCCAGAATT 
19201 TAATTGATTG GTTACATAAA CTTTTTGACT TCAGCGTTTG TTGTTGTTGT 
19251 TCTTTTACTG TCCTTGTTTT CACATAAAAA CTATATGGAG CCAGGCACAG 
19301 TGGCTCACGC CTGTAATCCC AGCATTTTGG GAGACCGAGG CAGGCGGATC 
19351 ACCTGAGGCC AGGAGTTTGA GACCAGCCTT GCCAACATGG TGAAACCCTG 
19401 TCTCTACTAA AGATACCAAA AAAGTGCTGG GTGTGGTGGC GGGCGCCTGT 
19451 AATCCCAGCT ACTCTGGAGG CTGAGGCATG AGAATTGCTT GAATCCAGGA 
19501 GGCGGAGTTT GCAGTGAGCT GAGATTGTGC CACTGCACTC CAGCCTGGGC 
19551 GACAGAGCGA GACTCCGTCT CAAAAAAAGA AAAAACAAAA CAAAACAAAA 
19601 ACCCGGTATG TGGTAAATTA CTTAATTGGG CAAAAGAAAA AAATGTCTGT 
19651 TGCTATGGTT CAGTCAGCCA GGTAGGAATA TTTTTTGTTG TAGAATTCCT 
19701 AAGTGCTTAT TTCCAGATAC AGGTGAATTT TTGTTAAAAG TATCCCTGTT 
19751 TCATAAGTGC ATTACACAAA TATTGGAGTT TTATCTGTTT AGGTTTTGTT 
19801 TTTTTTTTAG ACTGAGTCTT GCTCTGTTGC CCAAGTTGGA GTGCAGTGGC 
19851 GTGATCTCGG CTCACAGCAA CCTTCTTCCT CCTGGGTTCA AGCGATTCTC 
19901 TTGACTCAGT TTCCCGAGTG GCTGGGATTA CAGGCATGTG CCACCAGGTC 
19951 CTGCTAATTT TTGTATTTTT AGCAGAGGCA GGGTTTCACC ATGTTGTCGA 
20001 GGCTGGTCTC AAACTCCTGA CCTCAAGTGA TCTTCCTGCC TCGGCCTCCC 
20051 AAAGTGCTGG GATTAAAGGC ATGAGCCACT ATGCCTGGCT AATCTGTTTA 
20101 TGTATTTTAA ACATAAAATG CATGGGATTT TCTTGTAGGA CAAATAATGA 
20151 AACCAAGCTT GGTTTTCTAT GTTACTTAGG GGCAACATTT GTCAATACAG 
20201 TAAGGCTGTG TTCCTAAAGT AGACTAGGAG TTTTTAAGAA AGCTGAAACA 
20251 AAAAGTTTAT TGTAGAATGA CTGCATACAT TATGTTTAGG CCTCTGATAT 
20301 AGTCCAAATA CAGTGACTTT ATTTCAGAAT AGTTGAACTG TATGTGATAA 
20351 TTTTTTTAAA GAAGCATTTG ATGTTTAAAA ACAAGGTTTT TCCTGAGTTT 
20401 ACCAGTGTAG CCCTACAGAT TAAGGTGTTT GCTATCCTTT ATTTTCCCCT 
20451 TCATTTTATT TTTCCACTGC CATTGTACTA CCCAAGCCTC CTGTCCTTTC 
20501 CCCCAATAAG TGCTTCAAGT TCCCAAATTA GTGTTTACTT TCTATGAAAA 
20551 ACTCAGAGTA GCTGATCTCA GGATATAGGA GGAAAGAAAA ATATTCACAT 
20601 TATTTCTTAC TAAGAAGTTA TTGATTGCTA ACCCCCTGTC TCTTCTGAAA 
20651 ATTTACGTTC TTCACAAAGG GTATTTGCTA ATTTCTAGGC CTAATTCATG 
20701 GAATTTCGGG AATTAAAACG AAACTTTAAA AAATTAGGAT AGATGCAATG 
20751 CTTAGAGGTT AGGGCAGTAC CTCTGGGATC ATTGAGTGTC TTTTGTCAAC 
20801 CTTCCTTCCC CTCTTCTTTG AGCTTTCAAG TTCCTACTCT TAATTGCCTT 
20851 TTTTCCTTGT ATTTCTGAAC TCATTTTGTC AAGTTCCAAG GTTTTTTTTT 
20901 TTTTTTTTTT TTTTGACAGT GCCTTGAGCT TCAACACTAA AAGGGAAAAA 
20951 GATTTAGAAT GGCCAATGCA CATGAATCCT TTGTAATTTA GGTATTTTTC 
21001 TTAATAATTT GATACCTCAT AGAATTACTA TTTCTAGAAA TTCCATTGAA 
21051 TTGTTTCTAG AAATTCCATT GAAGTCAAGC TTGATTTTTT TAGGAGGCAT 
21101 TTGTAAAGTG CAGCTAAGTA GATTATTTCC AGCTTGCTGC TGCTGCTCAT 
21151 TTTCTTGAGG TTTTTTTTCA TCCATGCATT CATGAAAATT TTCAGAGTAG 
21201 TTGAATTCAA TTGACTCCTG CTGACAGCAA GGGG 
(SEQ ID NO: 3) 
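
The listing numbers each row by the 1-based position of its first base and prints up to five groups of ten bases per row. As a minimal sketch under that assumption, the Python below joins such rows and looks up the base at a given position. The helper names `parse_rows` and `base_at` are hypothetical, and the two sample rows are the rows indexed 20751 and 21201 of SEQ ID NO: 3 written out in full.

```python
import re

def parse_rows(text):
    """Map each row's 1-based start position to its bases with spaces removed."""
    rows = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+([ACGTN ]+?)\s*$", line)
        if m:
            rows[int(m.group(1))] = m.group(2).replace(" ", "")
    return rows

def base_at(rows, pos):
    """Return the base at 1-based position pos, or None if no parsed row covers it."""
    for start, bases in rows.items():
        if start <= pos < start + len(bases):
            return bases[pos - start]
    return None

sample = """\
20751 CTTAGAGGTT AGGGCAGTAC CTCTGGGATC ATTGAGTGTC TTTTGTCAAC
21201 TTGAATTCAA TTGACTCCTG CTGACAGCAA GGGG
"""
rows = parse_rows(sample)
print(base_at(rows, 20766))              # base printed at SNP position 20766
print(21201 + len(rows[21201]) - 1)      # position of the last base in the listing
```

With these two rows the sketch prints `A` and `21234`: the listed sequence carries the minor allele at position 20766, and the final row ends at base 21234.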



FEATURES: 

Start: 2007 
Exon: 2007-2059 
Intron: 2060-3118 
Exon: 3119-3341 
Intron: 3342-4462 
Exon: 4463-4553 
Intron: 4554-4948 
Exon: 4949-5015 
Intron: 5016-8054 
Exon: 8055-8171 
Intron: 8172-8258 
Exon: 8259-8425 
Intron: 8426-9007 
Exon: 9008-9102 
Intron: 9103-9352 
Exon: 9353-9482 
Intron: 9483-13437 
Exon: 13438-13520 
Intron: 13521-16514 
Exon: 16515-16594 
Intron: 16595-17962 
Exon: 17963-18053 
Intron: 18054-18139 
Exon: 18140-18277 
Stop: 18278 
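
The exon and intron coordinates can be checked for internal consistency. The following Python is a minimal sketch, assuming the coordinates are 1-based and inclusive; the names `FEATURES`, `STOP`, `gaps_or_overlaps` and `coding_length` are hypothetical. It confirms that consecutive features abut and sums the exon lengths.

```python
# Intervals transcribed from the FEATURES list (assumed 1-based, inclusive).
FEATURES = [
    ("Exon", 2007, 2059), ("Intron", 2060, 3118),
    ("Exon", 3119, 3341), ("Intron", 3342, 4462),
    ("Exon", 4463, 4553), ("Intron", 4554, 4948),
    ("Exon", 4949, 5015), ("Intron", 5016, 8054),
    ("Exon", 8055, 8171), ("Intron", 8172, 8258),
    ("Exon", 8259, 8425), ("Intron", 8426, 9007),
    ("Exon", 9008, 9102), ("Intron", 9103, 9352),
    ("Exon", 9353, 9482), ("Intron", 9483, 13437),
    ("Exon", 13438, 13520), ("Intron", 13521, 16514),
    ("Exon", 16515, 16594), ("Intron", 16595, 17962),
    ("Exon", 17963, 18053), ("Intron", 18054, 18139),
    ("Exon", 18140, 18277),
]
STOP = 18278  # position listed as "Stop"

def gaps_or_overlaps(features):
    """Return adjacent feature pairs whose coordinates do not abut exactly."""
    return [(a, b) for a, b in zip(features, features[1:]) if a[2] + 1 != b[1]]

def coding_length(features):
    """Total number of bases covered by exons."""
    return sum(end - start + 1 for kind, start, end in features if kind == "Exon")

print(gaps_or_overlaps(FEATURES))        # [] means the intervals tile without gaps
print(coding_length(FEATURES))           # 1335
print(coding_length(FEATURES) % 3)       # 0, i.e. a whole number of codons
print(FEATURES[-1][2] + 1 == STOP)       # True: Stop follows the last exon directly
```

Under these assumptions the twelve exons total 1335 bases (445 codons), and the stop position immediately follows the final exon.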
SNPs: 

DNA                                        Protein 
Position  Major  Minor  Domain             Position  Major  Minor 
76        A      -      Beyond ORF(5') 
7980      C      T      Intron 
8571      C      T      Intron 
11257     T      C,A    Intron 
11684     C      T      Intron 
13312     T      C      Intron 
17110     T      C      Intron 
17451     C      A      Intron 
20766     G      A      Beyond ORF(3') 
20914     T      -      Beyond ORF(3') 
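
The Domain column of the SNP table follows from the feature coordinates. The sketch below is a hedged illustration in Python: the name `classify` is hypothetical, coordinates are assumed to be 1-based and inclusive, and the stop codon is assumed to occupy the three bases beginning at the listed Stop position (18278-18280).

```python
# Exon intervals and ORF bounds transcribed from the FEATURES list.
EXONS = [
    (2007, 2059), (3119, 3341), (4463, 4553), (4949, 5015),
    (8055, 8171), (8259, 8425), (9008, 9102), (9353, 9482),
    (13438, 13520), (16515, 16594), (17963, 18053), (18140, 18277),
]
START = 2007
STOP = 18278           # first base of the stop codon (assumed to span 18278-18280)

def classify(pos):
    """Assign a genomic position to a domain label like those in the SNP table."""
    if pos < START:
        return "Beyond ORF(5')"
    if pos > STOP + 2:
        return "Beyond ORF(3')"
    if pos >= STOP:
        return "Stop codon"
    if any(lo <= pos <= hi for lo, hi in EXONS):
        return "Exon"
    return "Intron"

for snp in (76, 7980, 8571, 11257, 11684, 13312, 17110, 17451, 20766, 20914):
    print(snp, classify(snp))
```

Run against the ten listed positions, this reproduces the table: position 76 falls 5' of the ORF, 20766 and 20914 fall 3' of it, and the remaining seven lie in introns, so none of these SNPs alters the coding sequence.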



Context: 

DNA 
Position 

76 GCAGAAAAGTATAAAGATGGTAATCTCTGTAGGAAATTAGTCCCCATTATTTAGCTGTAA 
AATTATAATTAAAAA 

[A,-] 

AAAAATCTTTGTTTCTAAATCTTTGCGACTGATTATTTCCTGAAAATACACTCCAGGAAG 
AAGCATTTTTAAGTTAAAGCATGTGAACTCTTATTTCTTGCTACAGGTTCATATTTCTTT 
TTCTAGAGAGTTTGCCAAATTATACAACGTGCTCCTTCATGCTCTCACCAATCTTGGCTG 
TTTTGAAAGGCCAAGAATAATGTTTTGATTAAACTGAATTTTTAAATTTCTAACGAATTT 
GTCCGCTGTCATATATTTATTGATCATTTGAACATCTTTTTATTCTTAGCCTATTTATTA 

7980 TAATATCTGTGCATTGCCTGGCACAGAGTAGGCCTAGCCTGGTAAATGAATGAATGCTTT 
CAACAGTAGCATATCCTATTTTTGGTTTACATTTGTATATATCTTTTAAAACTGTTGTTG 
TATAAAATGTAATTAAATTTAAAATTCTAGGAGCAAACGTTAAAACTCATAAGTATTAAG 
GGAATTATCACTTCATATAAAGTATTTTATCAAAATGTTTTAAGAAGATGTTATATGGAA 
TCTGCTATAATATGTTCTGAAAGATTATTTTAAATGGCATAGAGGAATTGGTAATTAAGA 

[C,T] 

TATGCTTTAGAGCATAACATGGCTTCAGCTCACTCTTGTACATTTATCATTTTTATCTTA 
ATTTTATTTTTAAGGGATGGCATGCATGTAGCAGTGAAAATCGTAAAAAATGTAGGCCGT 
TACCGTGAAGCAGCTCGTTCAGAAATCCAAGTATTAGAGCACTTAAATAGTACTGATCCC 
AATAGTGTCTTGTAAGTATAACTTTCACCTAGGAGCCATCATATTACATGAAATATTCAG 
GTTTCCATAAACTGAATTATTATTTTGCTCTGTTTTAGCCGATGTGTCCAGATGCTAGAA 



8571 GATGCTAGAATGGTTTGATCATCATGGTCATGTTTGTATTGTGTTTGAACTACTGGGACT 
TAGTACTTACGATTTCATTAAAGAAAACAGCTTTCTGCCATTTCAAATTGACCACATCAG 
GCAGATGGCGTATCAGATCTGCCAGTCAATAAATTGTAAGTACACTTGATAAATCTTTAT 
TTTTATTTATTTATTTATTTATTTATTTTGAGACGGAGTCTCGCTCTGTCACCCAGGCTG 
GAGTGCAGTGGCGCTCTCGGGTCCCAGCAAGCTCAGCCTCCCGGGTTCACGCCATTTTCC 

[C,T] 

GCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCACCACCATGCCCAGCTAATTTTT 
TGTATTTTTAGTAGAGATGGGATTTCACAGTGTTAGCCAGGATGGTCTCGATCTCCTGAC 
CTTGTGATTGCCCCCCTCGGCCTCCCAAAGTGCTGGGGTTATAGGCGTGAGCCACTGTGC 
ACAGCAATAAATCTTTATTTTTAAATATTTTTTATGTTTGTACCTCCTTAACAATTAAGA 
TAAATCTTTAAGCACCAGAAAACTTGTTTTTATTATACAAGCTATATATCCAAATGTTGT 



11257 CACCACCACACCCGGCTAATTTTTATAGTTTTTAGTAGAGATGGGGTTTCACCATGTTAG 
CCAGGCTGGTCTCGAACTCCTGACCTCAGGTGATCCTCCTGCCTCGGCCTCCCAAAGTGC 
TTGGATTACAGGCGTGAGCCACCGCGCCCGGCCAAGGATTTTTTTTTTTTAATTTTTATG 
TTTTTTATAACAGAGACAGGGCCTCACCATGTTGCACAGGCTGGTCTCGAACTCCTGGGC 
TTAAGTGATCCGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGTGTGAGCCACCGC 

[T,C,A] 

CCCACCAGAATATGGTCAATCTTATTAATAAAGTTCCAAATGTGGCCAAGCAAGGGATAG 
TACAAATCTGAAATTGGAGTCCCTGGCCTTGAGGAGAAAGAATCAGGAGATTGGGAGAAT 
AGAAAGGTCCTTTGTTTGTGGAGTGAGGATGAAGGCATAATGCAATTGGAGGGGAAAATG 
TAGTCAGGTGCTAGAGTTGAAGTAGGCAGTTGGCCTTATGTTGGGTATAAAAGCTAACTC 
ATCCAAGAATGAGATGATTTAGAATGGTGTACTGCAGAAGATTACAGTCACCTGGGAAAA 

11684 GTCCTTTGTTTGTGGAGTGAGGATGAAGGCATAATGCAATTGGAGGGGAAAATGTAGTCA 
GGTGCTAGAGTTGAAGTAGGCAGTTGGCCTTATGTTGGGTATAAAAGCTAACTCATCCAA 
GAATGAGATGATTTAGAATGGTGTACTGCAGAAGATTACAGTCACCTGGGAAAAGACTAA 
ATTGGGAGATAGGAGTGGTTGAAAAATAAAACTTTTTTTTTTTTTTGAGACGCAGTCTTG 
CACTGTCACCCGGGCTGGACTGCAGTGGCACGATCTCGGCTCACTGCAACTTCTGCCTCC 
[C,T] 

GGGTTCAAGCGATTCTCCTGTGTCAGCCTCCCAAGTAGCTGGGCTTACAGGTGCCCGCCA 
CCACGCCCAGCTAATTTTTTGTATTTTTAGTAGAGATGGGGTTTCACCACATTGGCCAGG 
CTGGTCTCCAACTCCTGACCTTGTGATTCACCTGCCTTGGCCTCCCAAAGTGCTGGGATT 
ACAGGTGTGAGCCACCGTGCCTGGTTGAAAAATAAAACTTTTATGAGGTCCAAGCTCTAG 
CATTTACGGATTTTGTATGTGTTAATAGGTAGAAACCATGCTCC^ 

13312 TAGTAATTTCATTCTGAGGATAATTATAAACCTGTATTTGTGCTAATAAAATATAAAAAT 
TCTTGGACTAACCATGAACTGAGCATAATAATGGTTTTAACAGCAGTGCTCTCCCATTAT 
ATAAACAGTTCAGAGACTATGGAATATTTGCACGAATTGGTTGTATACTTGGAAAATGGT 
AGCCCCCTTTTATTTTACATAACATGCACCCCTCCCTAGTTAGAATACTGTGTCTTGATG 
TGAGCATATGGACTATGGAGTGTGTTGAATAGCATTTGCTGTAAAACTAGAACTATAAAC 
[T,C] 

CTGAATTTGGTGTCTTATTCTCCCAAATGGGTTCTGTAAAGGGAGCACTCATATAGGGAA 
GGATTTAATGTACTGTCAATTAAAAGTTTTTGCATAGTAAAATGTTTCTATTTGTTTTAA 
AATAGCTTTAGGTTGGTCTCAGCCTTGTGATGTTTGGAGCATAGGTTGCATTCTTATTGA 
ATATTACCTTGGTTTCACAGTCTTTCAGGTACGTGGCTAGTAAATTCCATTTAATAATTC 
ATAACAAATTGTAAACGTTAAAGGTATGCTAAAGTTTTGACTTCCATATTGGAAAATTGC 

17110 CACGATCTCAGCTCACTGCAGCCTCTGCCTCCTGGGTTCAAGTGATTCTTCTGCCTCAGC 
CCCCTGAGTCGCTGGGATTACAGGTGCCCACCACTATGCCTGGCTAATTTTTGTATTTTT 
AGTAGAGATGGGGTTTCACCGTGTTAGTCAGGCTGGTCTTGAACTCCTGACCTCAGGTGA 
TCTGCCTGCCTTGGCCTCCCAAAGTGCTGGAATTACAGGTGTGAGTCACTGCACTAGGCC 
ATGTTTTTAAAAACTAATATAATAAAAAATATTTACCTTGTGATCTAGTGCAGGGGTCCC 
[T,C] 

AACCCCTCGGAACTGGGCTGTACAACAGGAGGTGAGTGGCGGGTGAGTGAGCATTATTGC 
TGCCTGAGCTGCACCTCCTGTCAGATCAGCAGTGGCATTAGATTCTCATAGGAATGTGAA 
CCCTATTGTGAACTGCGCACGTGAGGGATCTACGTTGCATGAAGGTTCCTTATGAGAATC 
TAATGCCTGATGATCTGAGGTGGAAGTTTGATTCCAAACCATCATCCCTCCTCCCCGGAT 
CTGCTTCCATGAAACCGGTCCCTGGTTCCAAAAGGGTTGAGGACCACTGATCTAGTAAAC 

17451 GGGTGAGTGAGCATTATTGCTGCCTGAGCTGCACCTCCTGTCAGATCAGCAGTGGCATTA 
GATTCTCATAGGAATGTGAACCCTATTGTGAACTGCGCACGTGAGGGATCTACGTTGCAT 
GAAGGTTCCTTATGAGAATCTAATGCCTGATGATCTGAGGTGGAAGTTTGATTCCAAACC 
ATCATCCCTCCTCCCCGGATCTGCTTCCATGAAACCGGTCCCTGGTTCCAAAAGGGTTGA 
GGACCACTGATCTAGTAAACAAAATGGCTTTTGGGTTTTTTTTGTTTTTTTTTTTTTTTT 
[C,A] 

ACTCAAGTTTACGTTTGGCATAAGTGTTTTCTTAGGCGATGTAAAAATAATACATAGAAT 
ATGGAAAAGCTTGTGTTTTGGAATCATATCACTCTAAGTGTGAAATTTATTCTGTCCTTA 
ACCAGCTGTATATTCTTAGACAAGGTGGTATTTCCAAACACAGCTTCATCGCAGAAGCCA 
CCGAGGGAGTTCTTTAAAGATTTCCAGCCCCATTCTAGATCTAGTGAAAACAGAATTTTA 
GGACTGGATCCAGGGGGCCCCTAGTTTTAAGCTGACATTGTTCCATATGTGATAGGAACA 

20766 ACTGCCATTGTACTACCCAAGCCTCCTGTCCTTTCCCCCAATAAGTGCTTCAAGTTCCCA 
AATTAGTGTTTACTTTCTATGAAAAACTCAGAGTAGCTGATCTCAGGATATAGGAGGAAA 
GAAAAATATTCACATTATTTCTTACTAAGAAGTTATTGATTGCTAACCCCCTGTCTCTTC 
TGAAAATTTACGTTCTTCACAAAGGGTATTTGCTAATTTCTAGGCCTAATTCATGGAATT 
TCGGGAATTAAAACGAAACTTTAAAAAATTAGGATAGATGCAATGCTTAGAGGTTAGGGC 
[G,A] 

GTACCTCTGGGATCATTGAGTGTCTTTTGTCAACCTTCCTTCCCCTCTTCTTTGAGCTTT 
CAAGTTCCTACTCTTAATTGCCTTTTTTCCTTGTATTTCTGAACTCATTTTGTCAAGTTC 
CAAGGTTTTTTTTTTTTTTTTTTTTTTTGACAGTGCCTTGAGCTTCAACACTAAAAGGGA 
AAAAGATTTAGAATGGCCAATGCACATGAATCCTTTGTAATTTAGGTATTTTTCTTAATA 
ATTTGATACCTCATAGAATTACTATTTCTAGAAATTCCATTGAATTGTTTCTAGAAATTC 

20914 GAAGTTATTGATTGCTAACCCCCTGTCTCTTCTGAAAATTTACGTTCTTCACAAAGGGTA 
TTTGCTAATTTCTAGGCCTAATTCATGGAATTTCGGGAATTAAAACGAAACTTTAAAAAA 
TTAGGATAGATGCAATGCTTAGAGGTTAGGGCAGTACCTCTGGGATCATTGAGTGTCTTT 
TGTCAACCTTCCTTCCCCTCTTCTTTGAGCTTTCAAGTTCCTACTCTTAATTGCCTTTTT 
TCCTTGTATTTCTGAACTCATTTTGTCAAGTTCCAAGGTTTTTTTTTTTTTTTTTTTTTT 
[T,-] 

GACAGTGCCTTGAGCTTCAACACTAAAAGGGAAAAAGATTTAGAATGGCCAATGCACATG 
AATCCTTTGTAATTTAGGTATTTTTCTTAATAATTTGATACCTCATAGAATTACTATTTC 
TAGAAATTCCATTGAATTGTTTCTAGAAATTCCATTGAAGTCAAGCTTGATTTTTTTAGG 
AGGCATTTGTAAAGTGCAGCTAAGTAGATTATTTCCAGCTTGCTGCTGCTGCTCATTTTC 
TTGAGGTTTTTTTTCATCCATGCATTCATGAAAATTTTCAGAGTAGTTGAATTCAATTGA 



Chromosome map: 
Chromosome 5 